High Dimensional Clustering Using Parallel Coordinates and the Grand Tour
Abstract
In this paper, we present graphical techniques for cluster analysis of high-dimensional data. Parallel coordinate plots and parallel coordinate density plots map multivariate data into a two-dimensional display. The parallel coordinate representation has elegant duality properties with ordinary Cartesian plots, so that higher-dimensional mathematical structures can be analyzed. Our high-interaction software allows rapid editing of data to remove outliers and isolate clusters by brushing. Our brushing techniques allow not only hue adjustment but also saturation adjustment. Saturation adjustment makes it possible to handle comparatively massive data sets by using the α-channel of the Silicon Graphics workstation to compensate for heavy overplotting. The grand tour is a generalized rotation of coordinate axes in a high-dimensional space. Coupled with the full-dimensional plots allowed by the parallel coordinate display, these techniques allow the data analyst to explore data that are both high-dimensional and massive in size. We describe both techniques and illustrate their use for inverse regression and clustering. We have used these techniques to analyze data on the order of 250,000 observations in 8 dimensions. Because the analysis requires the use of color graphics, in the present paper we illustrate the methods with a more modest data set of 3848 observations. Other illustrations are available on our web page.
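As a rough illustration of the two ideas in the abstract, the sketch below maps each observation to a parallel coordinate polyline and estimates how saturated a region becomes when many semi-transparent lines overlap. It is not the authors' software: the function names, the min-max scaling, the binning along a single axis, and the use of the standard compositing rule (n layers of opacity alpha blend to 1 - (1 - alpha)^n) are all assumptions made for the example.

```python
def scale_columns(data):
    """Min-max scale each variable to [0, 1] so all axes share one vertical range."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    # A constant column has zero range; fall back to 1.0 to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(x - lo) / sp for x, lo, sp in zip(row, lows, spans)] for row in data]


def to_polylines(data):
    """Map each d-dimensional observation to d vertices (axis index, scaled value)."""
    return [[(float(j), y) for j, y in enumerate(row)] for row in scale_columns(data)]


def saturation_at_axis(polylines, axis, bins=4, alpha=0.25):
    """Blended saturation per vertical bin where the lines cross one axis.

    Assumes every line is drawn in the same color with opacity `alpha`,
    so n overlapping lines composite to 1 - (1 - alpha) ** n.
    """
    counts = [0] * bins
    for line in polylines:
        y = line[axis][1]
        counts[min(int(y * bins), bins - 1)] += 1
    return [1.0 - (1.0 - alpha) ** n for n in counts]


# Hypothetical toy data: three observations in three dimensions.
data = [[0, 10, 5], [1, 20, 5], [2, 30, 5]]
polylines = to_polylines(data)
print(polylines[1])
print(saturation_at_axis(polylines, axis=2, alpha=0.5))
```

A small alpha keeps sparse regions faint while dense regions approach full saturation, which is the sense in which saturation brushing compensates for overplotting.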
Similar resources
Visual Clustering and Classification: The Oronsay Particle Size Data Set Revisited
Interactive statistical graphics can be effectively used to find natural groupings in observations. In this paper we want to demonstrate how clustering and classification can be done with three approaches based on highly interactive graphical environments: high-dimensional scatterplots as available in XGobi, parallel coordinate plots as available in ExplorN, and linked low-dimensional views as ava...
Incorporating Density Estimation into Other Exploratory
Preliminary understanding of a new data set is routinely accomplished with graphical tools, such as those popularized originally by EDA. A number of more recent ideas for multivariate data analysis have emerged and some are available in software packages or shareware such as XGobi. In this talk, we illustrate how many of the point-oriented techniques can be supplemented by incorporating nonpara...
The Grand Tour in k-Dimensions
The grand tour introduced by Asimov (1985) is based on the idea that one method of searching for structure in d-dimensional data is to "look at it from all possible angles," or, more mathematically, to project the data sequentially into all possible two-planes. The collection of two-planes in a d-dimensional space is called a Grassmannian manifold. A key feature of the grand tour is that the projec...
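To make the notion of projecting onto a two-plane concrete, here is a minimal sketch of a single step only: it draws one random orthonormal pair of vectors in d dimensions by Gram-Schmidt and projects the data onto the plane they span. It does not implement Asimov's grand tour, which requires a continuous sequence of planes; the function names and the random-draw construction are assumptions for illustration.

```python
import math
import random


def random_two_frame(d, rng):
    """Return two orthonormal vectors in R^d (Gram-Schmidt on Gaussian draws)."""
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]

    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]  # remove the component along u
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    return u, v


def project(data, frame):
    """Project each d-dimensional row onto the two-plane spanned by the frame."""
    u, v = frame
    return [
        (sum(a * b for a, b in zip(row, u)), sum(a * b for a, b in zip(row, v)))
        for row in data
    ]


# Hypothetical data: four observations in four dimensions.
rng = random.Random(0)
data = [[1.0, 0.0, 2.0, 1.0], [0.5, 1.5, 0.0, 2.0],
        [2.0, 1.0, 1.0, 0.0], [0.0, 0.5, 1.5, 1.0]]
frame = random_two_frame(4, rng)
points_2d = project(data, frame)
print(points_2d)
```

Repeating the projection over a smoothly varying sequence of frames, rather than independent random draws, is what turns this single view into a tour.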
On Some Mathematics for Visualizing High Dimensional Data
The analysis of high-dimensional data offers a great challenge to the analyst because the human intuition about geometry of high dimensions fails. We have found that a combination of three basic techniques proves to be extraordinarily effective for visualizing large, high-dimensional data sets. Two important methods for visualizing high-dimensional data involve the parallel coordinate system an...
3D Grand Tour for Multidimensional Data and Clusters
The grand tour is a method for viewing multidimensional data via linear projections onto a sequence of two-dimensional subspaces, moving continuously from one projection to the next. This paper extends the method to a 3D grand tour, where projections are made onto three-dimensional subspaces. A 3D cluster-guided tour is proposed where sequences of projections are determined by cluster centroids....
Publication date: 1996